    Rocket: Efficient and Scalable All-Pairs Computations on Heterogeneous Platforms

    All-pairs compute problems apply a user-defined function to each combination of two items of a given data set. Although these problems present an abundance of parallelism, data reuse must be exploited to achieve good performance. Several researchers have considered this problem, resorting either to partial replication with static work distribution or to dynamic scheduling with full replication. In contrast, we present a solution that relies on hierarchical, multi-level software-based caches to maximize data reuse at each level in the distributed memory hierarchy, combined with a divide-and-conquer approach to exploit data locality, hierarchical work-stealing to dynamically balance the workload, and asynchronous processing to maximize resource utilization. We evaluate our solution using three real-world applications (from digital forensics, localization microscopy, and bioinformatics) on different platforms (from a desktop machine to a supercomputer). Results show excellent efficiency and scalability when scaling to 96 GPUs, even obtaining super-linear speedups due to a distributed cache.
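
    To make the data-reuse idea concrete, here is a minimal Python sketch (purely illustrative, not Rocket's actual API): pairs are visited in tiles so that a small software cache of loaded items is reused many times before eviction.

        from functools import lru_cache

        @lru_cache(maxsize=128)  # software cache: large enough for two tiles of items
        def load_item(i):
            # stand-in for an expensive load, e.g. reading an image from disk
            return f"item-{i}"

        def all_pairs(n, compare, tile=64):
            # tiled iteration: each (I, J) tile touches at most 2*tile items,
            # so every loaded item is reused ~tile times before being evicted
            results = {}
            for i0 in range(0, n, tile):
                for j0 in range(i0, n, tile):
                    for i in range(i0, min(i0 + tile, n)):
                        for j in range(max(j0, i + 1), min(j0 + tile, n)):
                            results[(i, j)] = compare(load_item(i), load_item(j))
            return results

        scores = all_pairs(256, lambda a, b: len(a) + len(b))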

    The LDBC Graphalytics Benchmark

    In this document, we describe LDBC Graphalytics, an industrial-grade benchmark for graph analysis platforms. The main goal of Graphalytics is to enable the fair and objective comparison of graph analysis platforms. Due to the diversity of bottlenecks and performance issues such platforms need to address, Graphalytics consists of a set of selected deterministic algorithms for full-graph analysis, standard graph datasets, synthetic dataset generators, and reference output for validation purposes. Its test harness produces deep metrics that quantify multiple kinds of system scalability (weak and strong) and robustness (e.g., against failures and performance variability). The benchmark also balances comprehensiveness with the runtime necessary to obtain the deep metrics. The benchmark comes with open-source software for generating performance data, for validating algorithm results, for monitoring and sharing performance data, and for obtaining the final benchmark result as a standard performance report.
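
    Because the algorithms are deterministic, a result can be checked by direct comparison against the reference output. A minimal sketch of such a validator in Python (the file format and tolerance below are assumptions, not the benchmark's actual harness):

        def load_vertex_values(path):
            # assumed format: one "vertex_id value" pair per line
            with open(path) as f:
                return {int(v): float(x) for v, x in (line.split() for line in f)}

        def validate(result_path, reference_path, tol=1e-4):
            result = load_vertex_values(result_path)
            reference = load_vertex_values(reference_path)
            if result.keys() != reference.keys():
                return False  # missing or extra vertices
            # exact values suit integral outputs (e.g. BFS depths); a small
            # tolerance suits floating-point outputs (e.g. PageRank scores)
            return all(abs(result[v] - reference[v]) <= tol for v in reference)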

    LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on Parallel and Distributed Platforms

    In this paper we introduce LDBC Graphalytics, a new industrial-grade benchmark for graph analysis platforms. It consists of six deterministic algorithms, standard datasets, synthetic dataset generators, and reference output that enable the objective comparison of graph analysis platforms. Its test harness produces deep metrics that quantify multiple kinds of system scalability (horizontal/vertical and weak/strong) and of robustness (e.g., against failures and performance variability). The benchmark comes with open-source software for generating data and monitoring performance. We describe and analyze six implementations of the benchmark (three from the community, three from industry), providing insights into the strengths and weaknesses of the platforms. Key to our contribution, the vendors themselves perform the tuning and benchmarking of their platforms.
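
    For a flavor of the workload, breadth-first search is among the benchmark's algorithms; below is a minimal single-threaded reference version in Python (assuming an adjacency-list graph; this is not one of the benchmark's tuned implementations).

        from collections import deque

        def bfs_depths(adj, source):
            # adj: dict mapping each vertex to a list of its neighbors
            # returns the BFS depth of every vertex reachable from source
            depth = {source: 0}
            queue = deque([source])
            while queue:
                v = queue.popleft()
                for w in adj.get(v, []):
                    if w not in depth:
                        depth[w] = depth[v] + 1
                        queue.append(w)
            return depth

        print(bfs_depths({0: [1, 2], 1: [3], 2: [3], 3: []}, 0))  # {0: 0, 1: 1, 2: 1, 3: 2}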

    Optimization Techniques for GPU Programming

    In the past decade, Graphics Processing Units have played an important role in the field of high-performance computing, and they continue to advance new fields such as IoT, autonomous vehicles, and exascale computing. It is therefore important to understand how to extract performance from these processors, which is not trivial. This survey discusses the optimization techniques found in 450 articles published over the last 14 years. We analyze the optimizations from different perspectives, which shows that they are highly interrelated, explaining the need for techniques such as auto-tuning.
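
    Auto-tuning, one of the recurring techniques, boils down to timing a kernel under each candidate configuration and keeping the fastest. A generic sketch in Python (run_kernel and the parameter space are placeholders, not tied to any GPU framework):

        import itertools, time

        def timed_run(run_kernel, cfg):
            start = time.perf_counter()
            run_kernel(**cfg)  # placeholder: launch the kernel with this configuration
            return time.perf_counter() - start

        def autotune(run_kernel, param_space, repeats=5):
            # exhaustive search over the configuration space; real tuners for
            # large spaces prune or search heuristically instead
            best_cfg, best_time = None, float("inf")
            for values in itertools.product(*param_space.values()):
                cfg = dict(zip(param_space, values))
                t = min(timed_run(run_kernel, cfg) for _ in range(repeats))
                if t < best_time:
                    best_cfg, best_time = cfg, t
            return best_cfg, best_time

        # example: tune a stand-in "kernel" over block sizes and unroll factors
        best, _ = autotune(lambda block_size, unroll: sum(range(block_size * unroll)),
                           {"block_size": [64, 128, 256], "unroll": [1, 2, 4]})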

    Lightning: Scaling the GPU Programming Model Beyond a Single GPU

    The GPU programming model is primarily aimed at the development of applications that run on a single GPU. However, this limits the scalability of GPU code to the capabilities of a single GPU in terms of compute power and memory capacity. To scale GPU applications further, a great engineering effort is typically required: work and data must be divided over multiple GPUs by hand, possibly across multiple nodes, and data must be manually spilled from GPU memory to higher-level memories. We present Lightning: a framework that follows the common GPU programming paradigm but enables scaling to large problems with ease. Lightning supports multi-GPU execution of GPU kernels, even across multiple nodes, and seamlessly spills data to higher-level memories (main memory and disk). Existing CUDA kernels can easily be adapted for use in Lightning, with data access annotations on these kernels allowing Lightning to infer their data requirements and the dependencies between subsequent kernel launches. Lightning efficiently distributes the work and data across GPUs and maximizes efficiency by overlapping scheduling, data movement, and kernel execution wherever possible. We present the design and implementation of Lightning, as well as experimental results on up to 32 GPUs for eight benchmarks and one real-world application. The evaluation shows excellent performance and scalability, such as a 57.2x speedup over the CPU using Lightning with 16 GPUs over 4 nodes and 80 GB of data, far beyond the memory capacity of one GPU.
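
    The key mechanism, access annotations that let the runtime split one logical kernel launch over devices, can be illustrated with a small sketch (illustrative Python only; Lightning's real API and annotation syntax differ):

        def launch(kernel, data, num_devices, access="blockwise"):
            # the annotation ("blockwise") promises that each work chunk only
            # reads/writes its own slice, so chunks can run on separate devices
            assert access == "blockwise", "only the blockwise pattern is sketched"
            chunk = (len(data) + num_devices - 1) // num_devices
            out = [None] * len(data)
            for dev in range(num_devices):
                lo, hi = dev * chunk, min((dev + 1) * chunk, len(data))
                # a real runtime would copy data[lo:hi] to device `dev` and run
                # the kernel there, overlapping transfers; here we run in-process
                out[lo:hi] = [kernel(x) for x in data[lo:hi]]
            return out

        print(launch(lambda x: x * x, list(range(10)), num_devices=4))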

    The Landscape of Exascale Research: A Data-Driven Literature Analysis

    The next generation of supercomputers will break the exascale barrier. Soon we will have systems capable of at least one quintillion (billion billion) floating-point operations per second (10^18 FLOPS). Tremendous amounts of work have been invested into identifying and overcoming the challenges of the exascale era. In this work, we present an overview of these efforts and provide insight into the important trends, developments, and exciting research opportunities in exascale computing. We use a three-stage approach in which we (1) discuss various exascale landmark studies, (2) use data-driven techniques to analyze the large collection of related literature, and (3) discuss eight research areas in depth based on influential articles. Overall, we observe that great advancements have been made in tackling the two primary exascale challenges: energy efficiency and fault tolerance. However, as we look forward, we still foresee two major concerns: the lack of suitable programming tools and the growing gap between processor performance and data bandwidth (i.e., memory, storage, and networks). Although we will certainly reach exascale soon, without additional research these issues could potentially limit the applicability of exascale computing.
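
    Back-of-the-envelope arithmetic puts the scale in perspective (the per-GPU figure below is an illustrative assumption):

        exaflops = 1e18      # 1 EFLOPS = 10**18 floating-point operations per second
        gpu_flops = 10e12    # assume ~10 TFLOPS sustained per GPU (illustrative)
        print(int(exaflops / gpu_flops))  # 100000 GPUs, before any efficiency losses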

    litstudy: A Python package for literature reviews

    Researchers are often faced with exploring new research domains. Broad questions about a research domain, such as who the influential authors are or what the important topics are, are difficult to answer due to the overwhelming number of relevant publications. Therefore, we present litstudy: a Python package that enables answering such questions using simple scripts or Jupyter notebooks. The package enables selecting scientific publications and studying their metadata using visualizations, bibliographic network analysis, and natural language processing. The software was previously used in a publication on the landscape of exascale computing, and we envision great potential for reuse.
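
    A minimal usage sketch (the function names follow litstudy's documented API as best recalled here and should be treated as assumptions, to be verified against the current release):

        import litstudy

        # load publication metadata exported from a reference manager
        docs = litstudy.load_bibtex("publications.bib")

        # plot the number of publications per year
        litstudy.plot_year_histogram(docs)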
